Data retrieval device

ABSTRACT

The data retrieval device includes a first skip correspondence table which corresponds to each piece of data in a retrieval target data series and which, for each range that the similarity between the corresponding data and retrieval data may take, records skip destination data information specifying the data that appears first after the corresponding data among the pieces of data whose similarities with the retrieval data may stand in a predetermined relationship to a predetermined threshold. The data retrieval device also includes a control unit which, when retrieving from the retrieval target data series data whose similarity with the retrieval data is smaller than or equal to the threshold, uses the first skip correspondence table to select the data in the retrieval target data series for which the similarity with the retrieval data must be calculated.

TECHNICAL FIELD

The present invention relates to data retrieval devices, and in particular, to data retrieval devices for retrieving data similar to retrieval data from among a retrieval target data series.

BACKGROUND ART

A typical method of retrieving data similar to retrieval data from among a retrieval target data series, such as video data or audio data stored in a storage device, calculates the similarities between the retrieval data and all pieces of data in the retrieval target data series and compares them with a threshold. However, calculating the similarity between two pieces of data is generally expensive, so a method that must calculate the similarity between the retrieval data and every piece of data in the retrieval target data series takes a long time. Several methods for speeding up this type of retrieval have therefore been proposed.

For example, the background art section of Patent Document 1 describes a method of performing retrieval at high speed by terminating a similarity calculation once the similarity exceeds a certain threshold. Patent Document 1 also proposes calculating, as a self similarity table, the similarities between one part of a data series and one or more other parts, and using that table to perform retrieval at high speed.

PRIOR ART DOCUMENT

Patent Document

-   Patent Document 1: Japanese Unexamined Patent Publication No. 2005-62555

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

As shown in FIG. 17(a), assume that a retrieval target data series is composed of y_(j), y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and so on. In that case, as shown in FIG. 17(b), the self similarity table of the leading data y_(j) contains the similarity d(y_(j), y_(j+1)) between the data y_(j) and the data y_(j+1), the similarity d(y_(j), y_(j+2)) between the data y_(j) and the data y_(j+2), and likewise d(y_(j), y_(j+3)), d(y_(j), y_(j+4)), d(y_(j), y_(j+5)), and so on. A similarity is assumed to take a non-negative value, and a smaller value indicates a higher similarity.

When data whose similarity with retrieval data x_(i) is smaller than or equal to a threshold th is retrieved from among a retrieval target data series, retrieval using a self similarity table proceeds as follows.

First, the similarity between the retrieval data x_(i) and the data y_(j) is calculated. Denote the obtained similarity by D(x_(i), y_(j)). Expression 1 below then determines whether the data y_(j) is similar or dissimilar to the retrieval data x_(i). If D(x_(i), y_(j)) is smaller than or equal to the threshold th, the data y_(j) is output as similar data. If it is larger than th, the data y_(j) is regarded as dissimilar data.

$$D(x_i, y_j) \le \mathit{th} \qquad \text{[Expression 1]}$$

If the data y_(j) is regarded as dissimilar, the next data on which similarity calculation with the retrieval data x_(i) is performed is determined as follows. First, the similarity d(y_(j), y_(j+1)) between the data y_(j) and the immediately following data y_(j+1) is obtained from the self similarity table of the data y_(j) and subtracted from the similarity D(x_(i), y_(j)). The result [D(x_(i), y_(j))−d(y_(j), y_(j+1))] is then compared with the threshold th. If [D(x_(i), y_(j))−d(y_(j), y_(j+1))≦th], the data y_(j+1) is selected as the data on which similarity calculation with the retrieval data x_(i) is performed next. If instead [D(x_(i), y_(j))−d(y_(j), y_(j+1))>th], the data y_(j+1) is excluded from similarity calculation. In that case, even if the similarity between y_(j+1) and x_(i) were calculated, it could not logically be smaller than or equal to th. When the data y_(j+1) is excluded, the same determination is repeated in sequence on each subsequent piece of data, using d(y_(j), y_(j+2)), d(y_(j), y_(j+3)), and so on, until the data to be used for similarity calculation with the retrieval data x_(i) is determined.
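For comparison with the skip correspondence tables introduced later, the prior-art selection step above can be sketched in Python as follows. The sketch assumes the self similarity table of y_(j) is a plain list whose (k−1)-th element is d(y_(j), y_(j+k)). The function name next_candidate is hypothetical. The loop makes the drawback discussed next visible: it performs one subtraction and one comparison for every item it rules out.

```python
from typing import Sequence


def next_candidate(D_xj: float, self_sim: Sequence[float], th: float) -> int:
    """Offset k (1-based) of the next item y_(j+k) that cannot be ruled out.

    self_sim[k - 1] holds d(y_(j), y_(j+k)) from the self similarity table of y_(j).
    y_(j+k) is ruled out when D(x_(i), y_(j)) - d(y_(j), y_(j+k)) > th.
    If every tabulated item is ruled out, the offset just past the table is returned.
    """
    for k, d in enumerate(self_sim, start=1):
        # One subtraction and one threshold comparison per subsequent item;
        # this per-item cost is what the skip correspondence table later avoids.
        if D_xj - d <= th:
            return k
    return len(self_sim) + 1
```

Example: with D(x_(i), y_(j)) = 62, th = 50 and self similarities [10, 14, 12], y_(j+1) is ruled out (62 − 10 = 52 > 50), and the function returns 2 because 62 − 14 = 48 ≤ 50.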

Using the self similarity table in this way reduces the number of pieces of data for which the similarity with the retrieval data x_(i) must be calculated, so retrieval can be performed at a higher speed.

However, a subtraction of similarities and a threshold determination must still be performed in sequence for each piece of data following the data y_(j) until the data to be used for similarity calculation is determined. This impedes further speed-up.

An object of the present invention is to provide a data retrieval device capable of retrieving at high speed, from among a retrieval target data series, data whose similarity with retrieval data is smaller than or equal to a predetermined threshold.

Means for Solving the Problems

According to an aspect of the present invention, a data retrieval device includes two components. The first is a first skip correspondence table, which corresponds to each piece of data in a retrieval target data series. For each range that the similarity between the corresponding data and retrieval data may take, it records skip destination data information specifying the data that appears first after the corresponding data among the pieces of data whose similarities with the retrieval data may stand in a predetermined relationship to a predetermined threshold. The second is a control unit which, when retrieving from the retrieval target data series data whose similarity with the retrieval data is smaller than or equal to the threshold, uses the first skip correspondence table to select the data in the retrieval target data series for which the similarity with the retrieval data must be calculated.

Effects of the Invention

According to the present invention, data whose similarity with retrieval data is smaller than or equal to a predetermined threshold can be retrieved at high speed from among a retrieval target data series.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a first embodiment of the present invention.

FIG. 2 is a flowchart showing an exemplary process performed by a first skip correspondence table generation section according to the first embodiment of the present invention.

FIG. 3 shows an example of a retrieval target data series and an exemplary configuration of an internal table used by the first skip correspondence table generation section according to the first embodiment of the present invention.

FIG. 4 shows a specific example of an internal table used by the first skip correspondence table generation section according to the first embodiment of the present invention.

FIG. 5 shows a specific example of the first skip correspondence table according to the first embodiment of the present invention.

FIG. 6 shows another specific example of the first skip correspondence table according to the first embodiment of the present invention.

FIG. 7 is a flowchart showing an exemplary process performed by a control section according to the first embodiment of the present invention.

FIG. 8 is a block diagram of a second embodiment of the present invention.

FIG. 9 is a flowchart showing an exemplary process performed by a second skip correspondence table generation section according to the second embodiment of the present invention.

FIG. 10 shows an example of a retrieval target data series and an exemplary configuration of an internal table used by the second skip correspondence table generation section according to the second embodiment of the present invention.

FIG. 11 shows a specific example of an internal table used by the second skip correspondence table generation section according to the second embodiment of the present invention.

FIG. 12 shows a specific example of the second skip correspondence table according to the second embodiment of the present invention.

FIG. 13 shows another specific example of the second skip correspondence table according to the second embodiment of the present invention.

FIG. 14 is a flowchart showing an exemplary process performed by a control section according to the second embodiment of the present invention.

FIG. 15 is a block diagram of a third embodiment of the present invention.

FIG. 16 is a flowchart showing an exemplary process performed by a control section according to the third embodiment of the present invention.

FIG. 17 shows an example of a self similarity table.

DESCRIPTION OF EMBODIMENTS

First Embodiment

Referring to FIG. 1, a data retrieval device 100 according to a first embodiment of the present invention includes a similarity calculation section 110, a control section 120, a first skip correspondence table generation section 130, a retrieval target data series storing section 140, and a first skip correspondence table storing section 150.

The retrieval target data series storing section 140 stores one or more retrieval target data series. Each retrieval target data series is composed of a sequence of pieces of data. For example, if the data retrieval device 100 is a moving image retrieval device, a retrieval target data series corresponds to a time-series signal in which successive frame images of a moving image, or feature vectors of those frame images, are arranged in time order. One piece of data then corresponds to one frame image or its feature vector. The data retrieval device of the present invention is applicable not only to moving image retrieval but also to a variety of other types of retrieval, such as audio retrieval. For convenience, however, the following description assumes that the retrieval target data series is a signal in which feature vectors of successive frame images of a moving image are arranged in time order.

The first skip correspondence table generation section 130 is a means for generating a first skip correspondence table for each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140. Here, the first skip correspondence table of a piece of data is a table containing, for each range that the similarity between that data and the retrieval data may take, information specifying the data that appears first after that data among the data whose similarity with the retrieval data may be smaller than or equal to a predetermined threshold th.

The first skip correspondence table storing section 150 is a means for storing the first skip correspondence tables generated by the first skip correspondence table generation section 130. Each first skip correspondence table is stored in the first skip correspondence table storing section 150 in association with its data in the retrieval target data series, so that the data to which the table corresponds can be clearly identified.

The similarity calculation section 110 is a means for calculating the similarity between the retrieval data and data in the retrieval target data series. The retrieval data may itself be one piece of data in a data series composed of a plurality of pieces of data. In the present embodiment, each piece of data in the retrieval target data series is a feature vector, and the retrieval data is also a feature vector. The similarity calculation section 110 calculates a distance between the vectors (e.g., the Hamming distance, the Euclidean distance, or the square of the Euclidean distance) as the similarity. In this case, the closer the similarity is to 0, the more similar the vectors are. Any scale of similarity may be used in the present invention, so the similarity may of course be calculated by a method other than those described above.
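The three distances named above could be computed as in the following Python sketch. The function names are hypothetical. One caution not stated in the text: the skip conditions described below bound D(x, y_(j+k)) from below by D(x, y_(j)) − d(y_(j), y_(j+k)), and that bound relies on the triangle inequality. The Hamming and Euclidean distances satisfy it. The squared Euclidean distance does not in general: for the scalars 0, 1 and 2 it gives 4 > 1 + 1. So with that measure the bound is not guaranteed to hold.

```python
import math
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; math.dist (Python 3.8+) rejects mismatched dimensions."""
    return math.dist(a, b)


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square of the Euclidean distance (avoids the square root)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((u - v) ** 2 for u, v in zip(a, b))
```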

The control section 120 is a means for controlling the entire data retrieval device 100. When retrieval data is input from outside the data retrieval device 100, the control section 120 causes the similarity calculation section 110 to calculate the similarity between the retrieval data and data in the retrieval target data series. It compares the calculation result with a predetermined threshold th to determine whether or not that data is similar to the retrieval data. If the data is similar to the retrieval data, the control section 120 outputs the data as a retrieval result and repeats the same processing for the next data. If the data is not similar, the control section 120 uses the similarity between the data and the retrieval data, together with the first skip correspondence table of the data, to determine which data in the retrieval target data series has its similarity with the retrieval data calculated next. It then repeats the same processing for the determined data.

Next, the operation of the data retrieval device 100 according to the present embodiment will be described.

The operation of the data retrieval device 100 is roughly divided into two parts: a first skip correspondence table generating operation, performed before the actual data retrieval operation, and a data retrieval operation that uses the generated first skip correspondence tables.

(1) First Skip Correspondence Table Generating Operation

For each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140, the first skip correspondence table generation section 130 generates a first skip correspondence table of that data according to the flowchart of FIG. 2.

First, the first skip correspondence table generation section 130 focuses on the piece of data in the retrieval target data series for which a first skip correspondence table is to be generated (step S101). For convenience, this description assumes that the retrieval target data series is a series of data (in this example, n-dimensional feature vectors) composed of y_(j), y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and so on, as shown in FIG. 3(a), and that the leading data y_(j) is the focused data.

Next, the first skip correspondence table generation section 130 calculates the similarities d(y_(j), y_(j+1)), d(y_(j), y_(j+2)), . . . d(y_(j), y_(j+m)) between the focused data y_(j) and the m subsequent pieces of data y_(j+1), y_(j+2), . . . y_(j+m), and stores the results in an internal table (step S102). The number m of subsequent pieces of data is arbitrary. A larger m may further reduce the number of pieces of data on which similarity calculation with the retrieval data is performed, but it also increases the storage capacity required for the skip correspondence tables. The value of m is therefore determined in advance by weighing both factors.

FIG. 3(b) shows an exemplary internal table used by the first skip correspondence table generation section 130 when generating the first skip correspondence table. The internal table has at most m entries. Each entry consists of five items: subsequent data, similarity, minimum value, skip possible condition, and continuous skip possible condition. At step S102, the first skip correspondence table generation section 130 fills in two items of the respective entries. It sets y_(j+1), y_(j+2), . . . y_(j+m) in the subsequent data item, and the similarities d(y_(j), y_(j+1)), d(y_(j), y_(j+2)), . . . d(y_(j), y_(j+m)) with the data y_(j) in the similarity item.
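One possible in-memory form of the internal table and of step S102 is sketched below in Python. The InternalEntry layout and the function name build_internal_table are hypothetical, and the distance function is passed in as a parameter. The minimum value item depends on the retrieval data, so this sketch leaves it implicit rather than storing it.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class InternalEntry:
    """One row of the internal table (hypothetical layout).

    The "minimum value" item, D(x, y_(j)) - similarity, depends on the retrieval
    data x, so it is kept implicit; only x-independent items are stored.
    """
    subsequent: int                       # index of y_(j+k) in the series
    similarity: float                     # d(y_(j), y_(j+k))
    skip_lower: float = 0.0               # filled at S104: D(x, y_(j)) > skip_lower
    continuous_skip_lower: float = 0.0    # filled at S105


def build_internal_table(series: Sequence[T], j: int, m: int,
                         dist: Callable[[T, T], float]) -> List[InternalEntry]:
    """Step S102: similarities between y_(j) and at most m subsequent items."""
    last = min(j + m, len(series) - 1)    # fewer than m entries near the end
    return [InternalEntry(i, dist(series[j], series[i]))
            for i in range(j + 1, last + 1)]
```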

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the first skip correspondence table generation section 130 calculates the minimum value of the similarity with the retrieval data. It does so from the similarity D(x, y_(j)) between the retrieval data x and the data y_(j) and from the similarity between the data y_(j) and the subsequent data. It sets that value in the minimum value item of the internal table (step S103). For example, the similarity of the subsequent data y_(j+1) with the data y_(j) is d(y_(j), y_(j+1)), and the similarity of the data y_(j) with the retrieval data is D(x, y_(j)). The minimum value of the similarity between the retrieval data and y_(j+1) is therefore [D(x, y_(j))−d(y_(j), y_(j+1))].

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the first skip correspondence table generation section 130 uses the minimum value of the similarity with the retrieval data and a separately given threshold th to calculate a lower limit. This is the lower limit of the range of the similarity between y_(j) and the retrieval data in which the subsequent data's similarity with the retrieval data cannot be smaller than or equal to the threshold. In other words, within that range the subsequent data cannot be similar to the retrieval data. The section sets this lower limit in the skip possible condition item of the internal table (step S104). For example, by Expression 1, the subsequent data y_(j+1) cannot be similar to the retrieval data if even the minimum value [D(x, y_(j))−d(y_(j), y_(j+1))] is larger than the threshold th. Therefore [D(x, y_(j))>th+d(y_(j), y_(j+1))] is set as its skip possible condition.
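The skip possible condition can be derived as follows. The derivation assumes, as the minimum-value step implicitly does, that D and d are the same symmetric distance satisfying the triangle inequality:

$$
\begin{aligned}
D(x, y_j) &\le D(x, y_{j+k}) + d(y_j, y_{j+k}) && \text{(triangle inequality)}\\
\Rightarrow\; D(x, y_{j+k}) &\ge D(x, y_j) - d(y_j, y_{j+k}),\\
D(x, y_j) > \mathit{th} + d(y_j, y_{j+k}) &\;\Rightarrow\; D(x, y_{j+k}) > \mathit{th}.
\end{aligned}
$$

With th = 50 and d(y_(j), y_(j+3)) = 12, as in the FIG. 4 values quoted in the text, the skip possible condition of y_(j+3) is D(x, y_(j)) > 62.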

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the first skip correspondence table generation section 130 calculates the maximum of the lower limits of the similarity given by the skip possible conditions of that data itself and of all the subsequent data preceding it. It sets that value in the continuous skip possible condition item of the internal table (step S105). A piece of data can be skipped together with all the data before it only when the similarity D(x, y_(j)) exceeds every one of these lower limits.
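Steps S104 and S105 reduce to a shift by th followed by a running maximum, as the following Python sketch shows. The function name skip_conditions is hypothetical, and limits are represented as plain numbers, meaning "skip if D(x, y_(j)) > limit".

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def skip_conditions(self_sims: Sequence[float], th: float) -> Tuple[List[float], List[float]]:
    """Steps S104 and S105 for one focused item y_(j).

    self_sims[k - 1] is d(y_(j), y_(j+k)).  Returns the lower limits of the skip
    possible conditions, D(x, y_(j)) > th + d(y_(j), y_(j+k)), and the lower limits
    of the continuous skip possible conditions (running maximum of the former).
    """
    skip = [th + d for d in self_sims]
    # y_(j+k) can be skipped together with everything before it only if
    # D(x, y_(j)) exceeds every limit up to k, hence the running maximum.
    continuous = list(accumulate(skip, max))
    return skip, continuous
```

With th = 50 and self similarities 10, 14 and 12 (inferred from the FIG. 4 and FIG. 5 values quoted in the text for y_(j+1) to y_(j+3)), this returns skip limits [60, 64, 62] and continuous limits [60, 64, 64]. The 64 for y_(j+3) matches the explanation of FIG. 4.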

Next, the first skip correspondence table generation section 130 generates the first skip correspondence table of the focused data y_(j) from the continuous skip possible conditions of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), and stores the table in the first skip correspondence table storing section 150 (step S106). The table is built from the lower limits of the similarity given by these continuous skip possible conditions, taken in increasing order. First, the section forms a first similarity range whose lower limit value is the smallest of these lower limits and whose upper limit value is the second smallest. As the skip destination data for the case where the similarity between the focused data y_(j) and the retrieval data falls within this range, it sets the first of the subsequent data whose continuous skip possible condition has a lower limit equal to the range's upper limit value. Next, it forms a second similarity range whose lower limit value is the second smallest of these lower limits and whose upper limit value is the third smallest. As the skip destination data for this second range, it sets the first of the subsequent data whose continuous skip possible condition has a lower limit equal to the second range's upper limit value. The first skip correspondence table generation section 130 repeats the same processing until it has formed the similarity range whose lower limit value is the largest of the lower limits given by the continuous skip possible conditions of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m).
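Step S106 can be sketched in Python as follows. The function names and the (lower, upper, destination) triple layout are hypothetical. The text does not spell out the upper limit and destination of the last range. Here the sketch assumes that above the largest limit every tabulated item can be skipped, so the destination is the item just past y_(j+m).

```python
import math
from typing import List, Optional, Sequence, Tuple

SkipEntry = Tuple[float, float, int]   # (lower, upper, skip destination index)


def build_skip_table(continuous: Sequence[float], j: int) -> List[SkipEntry]:
    """Step S106: turn non-decreasing continuous-skip lower limits into ranges.

    continuous[k - 1] is the lower limit for y_(j+k).  An entry (lo, hi, dest)
    means: if lo < D(x, y_(j)) <= hi, the next item to compare is y_(dest).
    """
    limits = sorted(set(continuous))
    table: List[SkipEntry] = []
    for lo, hi in zip(limits, limits[1:]):
        # Items whose limit is <= lo can all be skipped; the first item whose
        # limit reaches hi cannot, so it becomes the skip destination.
        k = next(k for k, c in enumerate(continuous, start=1) if c >= hi)
        table.append((lo, hi, j + k))
    if limits:
        # Assumption: above the largest limit every tabulated item is skipped.
        table.append((limits[-1], math.inf, j + len(continuous) + 1))
    return table


def lookup(table: Sequence[SkipEntry], D: float) -> Optional[int]:
    """Skip destination for similarity D, or None if no range contains D."""
    for lo, hi, dest in table:
        if lo < D <= hi:
            return dest
    return None
```

With the continuous limits [60, 64, 64, 64, 67], a prefix consistent with the FIG. 4 and FIG. 5 entries described in the text, the first two rows come out as (60, 64, j+2) and (64, 67, j+5). These match the first two entries of FIG. 5. The final open-ended row reflects only this truncated prefix.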

FIG. 4 shows a specific example of the internal table used in generating the first skip correspondence table of the data y_(j). FIG. 5 shows a specific example of the first skip correspondence table of the data y_(j). In this example, the threshold th is 50 and the number m is 13.

In the internal table shown in FIG. 4, consider for example the entry for the subsequent data y_(j+3). It indicates that the similarity of y_(j+3) with y_(j) is 12 and that the minimum value of its similarity with the retrieval data is [D(x, y_(j))−12]. Its skip possible condition is [D(x, y_(j))>62], and its continuous skip possible condition is [D(x, y_(j))>64]. The continuous skip possible condition of y_(j+3) is [D(x, y_(j))>64] rather than its own skip possible condition [D(x, y_(j))>62] because the skip possible condition of the preceding data y_(j+2) is [D(x, y_(j))>64].

In FIG. 5, the first entry of the first skip correspondence table of the data y_(j) applies when the similarity between y_(j) and the retrieval data is larger than 60 and smaller than or equal to 64. In that case, the next data whose similarity with the retrieval data is calculated is the data y_(j+2). This entry is generated from the continuous skip possible conditions of the subsequent data y_(j+1) and y_(j+2) in the internal table shown in FIG. 4.

Similarly, the second entry of the first skip correspondence table of the data y_(j) shown in FIG. 5 applies when the similarity between y_(j) and the retrieval data is larger than 64 and smaller than or equal to 67. In that case, the next data whose similarity with the retrieval data is calculated is the data y_(j+5). This entry is generated from the continuous skip possible conditions of the subsequent data y_(j+2) to y_(j+5) in the internal table shown in FIG. 4.

The first skip correspondence table generation section 130 generates first skip correspondence tables for the data other than y_(j) in the retrieval target data series stored in the retrieval target data series storing section 140, using the same procedure as for the data y_(j). The last data in the retrieval target data series has no subsequent data, so no first skip correspondence table is generated for it. Tables need not be generated for all data except the last. They may instead be generated only for predetermined partial data, such as even-numbered data, odd-numbered data, or every p-th piece of data (p > 2).

The first skip correspondence table generation section 130 may also reduce the number of entries in a first skip correspondence table generated at step S106 of FIG. 2 by combining a plurality of consecutive entries into one. The combined entry's similarity range has, as its lower limit value, the minimum of the lower limits of the combined entries' ranges and, as its upper limit value, the maximum of their upper limits. Its skip destination data is the leading (earliest) data among the skip destination data of the combined entries. For example, combining the fifth and sixth entries of the first skip correspondence table of FIG. 5 into one entry, and the seventh and eighth entries into another, yields the first skip correspondence table shown in FIG. 6.
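The combination rule can be sketched in Python as follows, using hypothetical (lower, upper, destination) triples. The function name combine_entries is hypothetical. Taking the earliest destination is what keeps the combined entry safe: for a similarity in a lower part of the merged range, the later destinations could skip data that may still be similar.

```python
from typing import List, Sequence, Tuple

SkipEntry = Tuple[float, float, int]   # (lower, upper, skip destination index)


def combine_entries(table: Sequence[SkipEntry], start: int, stop: int) -> List[SkipEntry]:
    """Merge the consecutive entries table[start:stop] into a single entry.

    The merged range spans the smallest lower limit to the largest upper limit,
    and the destination is the earliest (leading) one, which is safe for every
    similarity in the merged range.
    """
    if not 0 <= start < stop <= len(table):
        raise ValueError("invalid entry range")
    group = table[start:stop]
    merged = (min(e[0] for e in group),
              max(e[1] for e in group),
              min(e[2] for e in group))
    return list(table[:start]) + [merged] + list(table[stop:])
```

For the first two FIG. 5 entries, (60, 64, j+2) and (64, 67, j+5), combining them gives (60, 67, j+2).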

Combining entries in the first skip correspondence table in this way reduces the storage capacity required for the table. The cost is a shorter skip in some similarity ranges, where a slightly longer skip would actually have been possible.

When combining entries in the first skip correspondence table, the following approaches may be taken.

For example, an upper limit may be set on the storage capacity that can be allocated to the first skip correspondence table. Entries may then be combined repeatedly until the storage capacity of the first skip correspondence table falls below that upper limit.

When combining entries, the entries to be combined may also be chosen so as to minimize the similarity range sacrificed by the combination, that is, the range in which a slightly longer skip would actually have been possible. If the fifth and sixth entries shown in FIG. 5 are combined, the similarity range of the sixth entry, [75<D≦77], is sacrificed. If the seventh and eighth entries are combined, the similarity range of the eighth entry, [80<D≦84], is sacrificed. The sacrificed range is smaller in the former case, so combining the fifth and sixth entries is more effective. The number of frames that are no longer skipped, or the probability that the similarity falls within the sacrificed range, may also be taken into account. Combining entries while weighing what is sacrificed in this way maximizes the retrieval speed-up obtained per unit of storage capacity of the first skip correspondence table.
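The two approaches above can be combined into a greedy loop, sketched below in Python. It uses the entry count as a stand-in for storage capacity, and it measures the sacrifice of merging an entry into its predecessor as the width of the later entry's similarity range, as in the FIG. 5 comparison. Both choices are assumptions. The sketch does not weight by frame counts or probabilities, and the function name reduce_entries is hypothetical.

```python
from typing import List, Sequence, Tuple

SkipEntry = Tuple[float, float, int]   # (lower, upper, skip destination index)


def reduce_entries(table: Sequence[SkipEntry], max_entries: int) -> List[SkipEntry]:
    """Greedily merge adjacent entries until at most max_entries remain.

    Merging entry k with entry k + 1 keeps entry k's (earlier) destination, so
    the similarity range of entry k + 1 is sacrificed: skips there become
    shorter than necessary.  Each step merges the pair whose sacrificed range
    is narrowest.
    """
    result = list(table)
    while len(result) > max(max_entries, 1):
        k = min(range(len(result) - 1),
                key=lambda i: result[i + 1][1] - result[i + 1][0])
        lo, _, dest = result[k]
        _, hi, next_dest = result[k + 1]
        result[k:k + 2] = [(lo, hi, min(dest, next_dest))]
    return result
```

For the two options compared in the text, this rule prefers merging the fifth and sixth entries (sacrificed width 77 − 75 = 2) over the seventh and eighth (84 − 80 = 4).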

(2) Data Retrieval Operation

Upon receiving retrieval data, the control section 120 retrieves data similar to the retrieval data from the retrieval target data series according to the flowchart of FIG. 7. If there are a plurality of retrieval target data series, the same processing is performed on each of them. The data retrieval operation according to the present embodiment is described below for the case of a single retrieval target data series, from which data similar to the retrieval data is retrieved.

The control section 120 first sets to 1 a variable j that indicates the position, counted from the head of the retrieval target data series, of the data to be processed (step S111). It then causes the similarity calculation section 110 to calculate the similarity between the first data and the retrieval data (step S112).

If the similarity between the first data and the retrieval data is smaller than or equal to the threshold th (YES at step S113), the first data is output as similar data (step S114). The control section 120 then increments the variable j to 2 (step S115) and returns to step S112 via step S119, with the second data as the data whose similarity is calculated next. It repeats the same processing as for the first data.

If the similarity between the first data and the retrieval data is larger than the threshold th (NO at step S113), the control section 120 checks whether a first skip correspondence table of the first data is stored in the first skip correspondence table storing section 150 (step S116). If no such table is stored, the control section 120 increments the variable j to 2 (step S115) and returns to step S112 via step S119, with the second data as the data whose similarity is calculated next. It repeats the same processing as for the first data.

If the first skip correspondence table of the first data is stored, the control section 120 checks whether the table contains a similarity range that includes the similarity between the first data and the retrieval data (step S117). If it does not, the control section 120 increments the variable j to 2 (step S115) and returns to step S112 via step S119, with the second data as the data whose similarity is calculated next. It repeats the same processing as for the first data.

If the first skip correspondence table of the first data contains a similarity range that includes the similarity between the first data and the retrieval data, the control section 120 sets the skip destination data recorded for that range as the data whose similarity is calculated next. That is, it changes the variable j to indicate the skip destination data (step S118). It then returns to step S112 via step S119 and repeats, on the skip destination data, the same processing as for the first data.

At step S119, the control section 120 determines whether the updated value of the variable j exceeds the maximum value j_(max), which is the number of pieces of data in the retrieval target data series. If it does not, the process returns to step S112. If it does, the retrieval process for that retrieval target data series ends.
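The flow of steps S111 to S119 can be sketched in Python as below. Indices are 0-based rather than the 1-based j of the flowchart. The skip tables are assumed to be a dictionary from data index to (lower, upper, destination) triples, and data without a table is simply absent from it. The function name retrieve is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
SkipEntry = Tuple[float, float, int]   # (lower, upper, skip destination index)


def retrieve(x: T, series: Sequence[T], tables: Dict[int, List[SkipEntry]],
             th: float, dist: Callable[[T, T], float]) -> List[int]:
    """FIG. 7 loop (steps S111 to S119) with 0-based indices instead of 1-based j."""
    hits: List[int] = []
    j = 0                                        # S111
    while j < len(series):                       # S119: stop past the last item
        D = dist(x, series[j])                   # S112
        if D <= th:                              # S113: similar
            hits.append(j)                       # S114
            j += 1                               # S115
            continue
        dest: Optional[int] = None
        table = tables.get(j)                    # S116: table may be absent
        if table is not None:
            # S117: find the similarity range containing D, if any
            dest = next((t for lo, hi, t in table if lo < D <= hi), None)
        j = dest if dest is not None else j + 1  # S118 or S115
    return hits
```

Example: for the scalar series [0, 100, 100, 3] with x = 0, th = 5 and absolute difference as the distance, the table of index 1 is [(5, 102, 3), (102, inf, 4)]. The loop returns [0, 3] and computes only three similarities, skipping index 2.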

As described above, according to the present embodiment, data whose similarity with the retrieval data is smaller than or equal to a predetermined threshold can be retrieved at high speed from the retrieval target data series. This is because, whenever the similarity between a piece of data in the retrieval target data series and the retrieval data is larger than the threshold, data for which similarity calculation is unnecessary can be skipped by referring to that piece of data's first skip correspondence table.

For example, if the similarity between the data y_(j) in the retrieval target data series and the retrieval data is 72, the skip destination data is y_(j+7), according to the first skip correspondence table of the data y_(j) shown in FIG. 5 or 6. No similarity with the retrieval data is then calculated for the six pieces of data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6), and the retrieval time is reduced accordingly. It is also unnecessary to decide individually, for each of these six pieces of data, whether similarity calculation with the retrieval data is needed. This saves the time those decisions would have taken and reduces the retrieval time further.

Although the threshold th is fixed to a single value in the present embodiment, the present invention is also applicable to a data retrieval device that uses a plurality of thresholds th. In that case, a first skip correspondence table is generated and stored in advance for each threshold th. For example, if the thresholds th are 50, 60, and 70, a first skip correspondence table may be generated and stored for each of th=50, th=60, and th=70.

Second Embodiment

Referring to FIG. 8, a data retrieval device 200 according to a second embodiment of the present invention differs from the data retrieval device 100 according to the first embodiment as follows. It includes a control section 220, a second skip correspondence table generation section 230, and a second skip correspondence table storing section 250 instead of the control section 120, the first skip correspondence table generation section 130, and the first skip correspondence table storing section 150.

The second skip correspondence table generation section 230 is a means for generating a second skip correspondence table of each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140. It should be noted that a second skip correspondence table of a piece of data means a table containing, for each of the ranges that the similarity between such data and the retrieval data may take, information for specifying the data which appears first after such data among data whose similarities with the retrieval data may be larger than a predetermined threshold th.

The second skip correspondence table storing section 250 is a means for storing the second skip correspondence table generated by the second skip correspondence table generation section 230. The second skip correspondence table is stored in the second skip correspondence table storing section 250 in association with data in the retrieval target data series, in such a manner that the data corresponding to the table is clearly distinguishable.

The control section 220 is a means for controlling the entire data retrieval device 200. When retrieval data is input from outside the data retrieval device 200, the control section 220 controls the similarity calculation section 110 to calculate a similarity between the retrieval data and data in the retrieval target data series, and compares the calculation result with a predetermined threshold th to determine whether or not such data is similar to the retrieval data. If such data is similar to the retrieval data, the control section 220 outputs the data as a retrieval result, and determines the data in the retrieval target data series for which a similarity with the retrieval data is to be calculated next, according to the similarity between such data and the retrieval data and the second skip correspondence table of such data. If the determined data is not the data immediately following such data, the control section 220 outputs, as similar data, the data ranging from the data immediately following such data to the data immediately preceding the determined data, and repeats the same processing for the determined data. In contrast, if such data is not similar to the retrieval data, the control section 220 repeats the same processing for the data immediately following such data.

Next, operation of the data retrieval device 200 according to the present embodiment will be described.

Operation of the data retrieval device 200 is roughly classified into a second skip correspondence table generating operation, which is performed prior to execution of the actual data retrieval operation, and a data retrieval operation using the generated second skip correspondence tables.

(1) Second Skip Correspondence Table Generating Operation

The second skip correspondence table generation section 230 generates, for each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140, a second skip correspondence table of the data, in line with the flow shown in the flowchart of FIG. 9.

First, the second skip correspondence table generation section 230 focuses on a piece of data in the retrieval target data series for which a second skip correspondence table is to be generated (step S201). In this description, it is assumed, for the sake of convenience, that the retrieval target data series is a series of data (in this example, n-dimensional feature vectors) composed of y_(j), y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and so on, as shown in FIG. 10(a), and that the leading data y_(j) is focused on.

Next, the second skip correspondence table generation section 230 calculates the similarities d(y_(j), y_(j+1)), d(y_(j), y_(j+2)), . . . d(y_(j), y_(j+m)) between the focused data y_(j) and the subsequent m pieces of data y_(j+1), y_(j+2), . . . y_(j+m), and stores the calculation results in an internal table (step S202). The number m of subsequent data for which similarities are calculated is arbitrary. A larger m may further reduce the number of data for which similarity calculation with the retrieval data is performed, but it increases the storage capacity required for the skip correspondence tables. The value of m is therefore determined in advance in consideration of both.

FIG. 10(b) shows an exemplary internal table used in the process of generating the second skip correspondence table by the second skip correspondence table generation section 230. The internal table is composed of at most m entries, and each entry is composed of five items: subsequent data, similarity, maximum value, skip possible condition, and continuous skip possible condition. At step S202, the second skip correspondence table generation section 230 sets y_(j+1), y_(j+2), . . . y_(j+m) to the item of subsequent data, and sets the similarities d(y_(j), y_(j+1)), d(y_(j), y_(j+2)), . . . d(y_(j), y_(j+m)) with the data y_(j) to the item of similarity, in the respective entries of the internal table.

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the second skip correspondence table generation section 230 calculates a maximum value of the similarity with the retrieval data, using the similarity D(x, y_(j)) between the retrieval data and the data y_(j) and the similarity between the data y_(j) and the subsequent data, and sets the value to the item of maximum value in the internal table (step S203). For example, in the case of the subsequent data y_(j+1), since the similarity with the data y_(j) is d(y_(j), y_(j+1)) and the similarity between the data y_(j) and the retrieval data is D(x, y_(j)), the maximum value of the similarity between the retrieval data and the subsequent data y_(j+1) is [D(x, y_(j))+d(y_(j), y_(j+1))].

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the second skip correspondence table generation section 230 calculates, using the maximum value of the similarity with the retrieval data and a separately provided threshold th, the upper limit of the range of the similarity between the data y_(j) and the retrieval data within which there is no possibility that the similarity between the subsequent data and the retrieval data becomes larger than the threshold (that is, no possibility that the subsequent data is not similar to the retrieval data), and sets the value to the item of skip possible condition of the internal table (step S204). For example, in the case of the subsequent data y_(j+1), since there is no possibility that the data is not similar to the retrieval data if even the maximum value [D(x, y_(j))+d(y_(j), y_(j+1))] is smaller than or equal to the threshold th, [D(x, y_(j))≦th−d(y_(j), y_(j+1))] is set as the skip possible condition.
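The step from the maximum value to the skip possible condition can be written out as below. This derivation assumes, as the maximum-value statement of step S203 implicitly does, that the similarity behaves like a distance obeying the triangle inequality; the offset symbol k (1 ≤ k ≤ m) is introduced here for illustration only.

```latex
\begin{aligned}
D(x, y_{j+k}) &\le D(x, y_j) + d(y_j, y_{j+k}) && \text{(maximum value, step S203)} \\
D(x, y_j) \le th - d(y_j, y_{j+k}) &\;\Longrightarrow\; D(x, y_{j+k}) \le th && \text{(skip possible condition, step S204)}
\end{aligned}
```

For instance, with th = 50 and d(y_(j), y_(j+3)) = 12, the condition becomes D(x, y_(j)) ≦ 38, which is the skip possible condition of y_(j+3) in FIG. 11.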

Next, for each of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the second skip correspondence table generation section 230 calculates the minimum value of the upper limits of the similarity given by the skip possible conditions of that data and of all the other subsequent data preceding it, and sets the value to the item of continuous skip possible condition of the internal table (step S205).
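Steps S202 to S205 can be sketched in Python as a single pass over the similarities to the subsequent data. This is a minimal sketch, not the device's implementation: the names `InternalEntry` and `build_internal_table` are hypothetical, the similarities d(y_(j), y_(j+k)) are assumed to be precomputed into a list, and the maximum value item is kept symbolic (only d is stored) because it depends on the unknown D(x, y_(j)).

```python
from typing import List, NamedTuple


class InternalEntry(NamedTuple):
    """One row of the internal table (cf. FIG. 10(b)) for subsequent data y_(j+offset)."""
    offset: int          # k, so that this row describes y_(j+k)
    similarity: float    # d(y_j, y_(j+k)); the maximum value item is D(x, y_j) + this
    skip_upper: float    # skip possible condition: D(x, y_j) <= th - d(y_j, y_(j+k))
    cont_upper: float    # continuous skip possible condition (running minimum)


def build_internal_table(d: List[float], th: float) -> List[InternalEntry]:
    """Hypothetical sketch of steps S202-S205; d[k-1] holds d(y_j, y_(j+k))."""
    table: List[InternalEntry] = []
    running_min = float("inf")
    for k, dk in enumerate(d, start=1):
        skip_upper = th - dk                         # step S204
        running_min = min(running_min, skip_upper)   # step S205
        table.append(InternalEntry(k, dk, skip_upper, running_min))
    return table
```

With th = 50 and d = [10, 14, 12] (values chosen to be consistent with the FIG. 11 conditions quoted in this description), the skip possible conditions come out as 40, 36, 38 and the continuous skip possible conditions as 40, 36, 36, reproducing the reason why y_(j+3) ends up with [D(x, y_(j))≦36].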

Next, according to the continuous skip possible conditions of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the second skip correspondence table generation section 230 generates the second skip correspondence table of the focused data y_(j), and stores the table in the second skip correspondence table storing section 250 (step S206). Specifically, from among the upper limits of the similarity given by the continuous skip possible conditions of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m), the second skip correspondence table generation section 230 generates a first similarity range whose upper limit value is the largest of these upper limits and whose lower limit value is the second largest, and sets, as the skip destination data for the case where the similarity between the focused data y_(j) and the retrieval data falls within the first similarity range, the first data among the subsequent data whose continuous skip possible condition is equivalent to the lower limit value of the first similarity range. Next, the second skip correspondence table generation section 230 generates a second similarity range whose upper limit value is the second largest of these upper limits and whose lower limit value is the third largest, and sets, as the skip destination data for the case where the similarity between the focused data y_(j) and the retrieval data falls within the second similarity range, the first data among the subsequent data whose continuous skip possible condition is equivalent to the lower limit value of the second similarity range.
The second skip correspondence table generation section 230 repeats the same processing until it has generated the similarity range whose lower limit value is the minimum value among the upper limits of the similarity given by the continuous skip possible conditions of the subsequent data y_(j+1), y_(j+2), . . . y_(j+m).
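Step S206 can be sketched as pairing each distinct continuous skip possible condition with the next smaller one. This is a minimal sketch under stated assumptions: `build_second_skip_table` and the `(lower, upper, offset)` tuple layout are hypothetical, an entry is taken to apply when lower < D(x, y_(j)) ≦ upper, and the destination is taken to be the first subsequent data whose continuous condition equals the range's lower limit, which is the reading consistent with the FIG. 12 entries.

```python
from typing import List, Tuple

# Hypothetical entry layout: (lower, upper, offset); the entry applies when
# lower < D(x, y_j) <= upper, and the skip destination is y_(j+offset).
SkipEntry = Tuple[float, float, int]


def build_second_skip_table(cont_upper: List[float]) -> List[SkipEntry]:
    """Sketch of step S206; cont_upper[k-1] is the continuous skip possible
    condition of y_(j+k), which is non-increasing because it is a running minimum."""
    limits = sorted(set(cont_upper), reverse=True)   # distinct upper limits, largest first
    table: List[SkipEntry] = []
    for upper, lower in zip(limits, limits[1:]):
        # For lower < D <= upper, every y_(j+k) whose condition is >= upper is
        # guaranteed similar; the first one whose condition equals `lower` is not,
        # so it becomes the next data for which a similarity is calculated.
        offset = cont_upper.index(lower) + 1
        table.append((lower, upper, offset))
    # As described, the last range has the minimum as its lower limit, so a
    # similarity at or below that minimum is not covered by the table.
    return table
```

Given continuous skip possible conditions of 40, 36, 36, 36, 33 for y_(j+1) to y_(j+5), the sketch yields (36, 40, 2) and (33, 36, 5), which correspond to the first two entries of the second skip correspondence table in FIG. 12.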

FIG. 11 shows a specific example of the internal table used in the process of generating the second skip correspondence table of the data y_(j), and FIG. 12 shows a specific example of the second skip correspondence table of the data y_(j). In this example, the threshold th is 50, and the number m is 13.

In the internal table shown in FIG. 11, the entry of the subsequent data y_(j+3), for example, indicates that the similarity with the data y_(j) is 12, the maximum value of the similarity with the retrieval data is [D(x, y_(j))+12], the skip possible condition is [D(x, y_(j))≦38], and the continuous skip possible condition is [D(x, y_(j))≦36]. The continuous skip possible condition of the subsequent data y_(j+3) is [D(x, y_(j))≦36] rather than its own skip possible condition [D(x, y_(j))≦38] because the skip possible condition of the data y_(j+2), which precedes y_(j+3), is [D(x, y_(j))≦36].

Further, in FIG. 12, the first entry in the second skip correspondence table of the data y_(j) indicates that if the similarity between the data y_(j) and the retrieval data is larger than 36 and smaller than or equal to 40, the next data for which a similarity with the retrieval data is calculated is the data y_(j+2). The first entry is generated from the continuous skip possible conditions of the subsequent data y_(j+1) and y_(j+2) in the internal table shown in FIG. 11.

Similarly, the second entry in the second skip correspondence table of the data y_(j) shown in FIG. 12 indicates that if the similarity between the data y_(j) and the retrieval data is larger than 33 and smaller than or equal to 36, the next data for which a similarity with the retrieval data is calculated is the data y_(j+5). The second entry is generated from the continuous skip possible conditions of the subsequent data y_(j+2) to y_(j+5) in the internal table shown in FIG. 11.
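Reading an entry back out is a lookup over half-open ranges. The sketch below is illustrative only: `find_skip_offset` is a hypothetical helper, the `(lower, upper, offset)` layout is an assumption, and only the two FIG. 12 entries spelled out above are encoded.

```python
from typing import List, Optional, Tuple

SkipEntry = Tuple[float, float, int]  # hypothetical: (lower, upper, offset), lower < D <= upper


def find_skip_offset(table: List[SkipEntry], similarity: float) -> Optional[int]:
    """Return the offset of the skip destination, or None if no range contains the similarity."""
    for lower, upper, offset in table:
        if lower < similarity <= upper:
            return offset
    return None


# First two entries of the second skip correspondence table of y_(j) in FIG. 12.
FIG12_HEAD: List[SkipEntry] = [(36, 40, 2), (33, 36, 5)]
```

A similarity of 38 maps to offset 2 (y_(j+2)) and a similarity of exactly 36 falls into the second entry, offset 5 (y_(j+5)), because each range excludes its lower limit.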

The second skip correspondence table generation section 230 generates second skip correspondence tables of the data other than the data y_(j) in the retrieval target data series stored in the retrieval target data series storing section 140, in accordance with the same procedure as that for the data y_(j). However, since the last data in the retrieval target data series has no subsequent data, no second skip correspondence table is generated for it. Further, instead of generating second skip correspondence tables for all data except the last data, it is also possible to generate them only for predetermined partial data, such as the even-numbered data, the odd-numbered data, or every p-th data (p > 2).

Further, the second skip correspondence table generation section 230 may perform a process of combining a plurality of continuous entries in the second skip correspondence table generated at step S206 in FIG. 9 into one entry, thereby reducing the number of entries in the second skip correspondence table. An entry formed by combining a plurality of continuous entries has a similarity range whose lower limit value is the minimum of the lower limits of the similarity ranges of the entries before combination and whose upper limit value is the maximum of their upper limits, and has, as its skip destination data, the leading data among the skip destination data of the entries before combination. For example, if the fifth and sixth entries of the second skip correspondence table of FIG. 12 are combined into one entry and the seventh and eighth entries are combined into one entry, the second skip correspondence table shown in FIG. 13 is generated.
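The combination rule can be sketched as follows. `combine_entries` and the `(lower, upper, offset)` layout are hypothetical; taking the smallest offset keeps the merged entry safe, since it never skips past data that one of the original entries would have calculated.

```python
from typing import List, Tuple

SkipEntry = Tuple[float, float, int]  # hypothetical: (lower, upper, offset), lower < D <= upper


def combine_entries(table: List[SkipEntry], first: int, last: int) -> List[SkipEntry]:
    """Combine the continuous entries table[first..last] (inclusive) into one entry."""
    group = table[first:last + 1]
    merged = (
        min(e[0] for e in group),   # smallest lower limit
        max(e[1] for e in group),   # largest upper limit
        min(e[2] for e in group),   # leading (earliest) skip destination
    )
    return table[:first] + [merged] + table[last + 1:]
```

Applied to the first two FIG. 12 entries, (36, 40, 2) and (33, 36, 5), it gives (33, 40, 2): any similarity in (33, 40] now leads to y_(j+2), and the longer skip to y_(j+5) for (33, 36] is sacrificed.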

As described above, by combining a plurality of entries in the second skip correspondence table to reduce the number of entries, it is possible to reduce the storage capacity required for the second skip correspondence table at the cost of part of the range over which the maximum skip is possible (for similarities within the sacrificed range, a somewhat longer skip would actually have been possible).

When combining a plurality of entries in the second skip correspondence table, the following processes may be used.

For example, if an upper limit has been set on the storage capacity which can be allocated to the second skip correspondence table, it is possible to repeat the reduction of the number of entries by combining entries until the storage capacity of the second skip correspondence table becomes smaller than the upper limit.

Further, when combining a plurality of entries, it is also possible, for example, to select the entries to be combined so as to minimize the similarity range sacrificed by the combination (that is, the range of similarities for which a somewhat longer skip would actually have been possible). Specifically, if the fifth and sixth entries shown in FIG. 12 are combined into one entry, the similarity range of the sixth entry, [23<D≦25], is sacrificed. If the seventh and eighth entries are combined into one entry, the similarity range of the eighth entry, [16<D≦20], is sacrificed. Comparing these two cases, the similarity range to be sacrificed is smaller in the former, so it is more effective to combine the fifth and sixth entries. In this comparison, the number of frames to be sacrificed or the probability that the similarity falls within the sacrificed range may also be considered. By reducing the number of entries in the second skip correspondence table while considering the sacrifice in this way, it is possible to maximize the speed-up of the retrieval operation provided by the second skip correspondence table per unit of storage capacity.
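The two processes above can be combined into a greedy loop. This is a sketch under explicit assumptions: the storage limit is approximated by a maximum entry count `max_entries`, the sacrifice is measured by the width of the lower entry's range (the frame-count or probability weighting mentioned above is not modelled), and `reduce_entries` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical layout: (lower, upper, offset), entries ordered by decreasing similarity.
SkipEntry = Tuple[float, float, int]


def reduce_entries(table: List[SkipEntry], max_entries: int) -> List[SkipEntry]:
    """Greedily merge adjacent entries until at most max_entries remain.

    Merging entries i and i+1 sacrifices the range of entry i+1 (its longer
    skip is replaced by the shorter skip of entry i), so the pair whose lower
    entry has the narrowest similarity range is merged first.
    """
    table = list(table)
    while len(table) > max(max_entries, 1):
        i = min(range(len(table) - 1), key=lambda k: table[k + 1][1] - table[k + 1][0])
        lower = min(table[i][0], table[i + 1][0])
        upper = max(table[i][1], table[i + 1][1])
        offset = min(table[i][2], table[i + 1][2])
        table[i:i + 2] = [(lower, upper, offset)]
    return table
```

With the hypothetical entries (10, 20, 2), (8, 10, 4), (0, 8, 6) and a limit of two entries, the loop sacrifices the width-2 range (8, 10] rather than the width-8 range (0, 8], giving (8, 20, 2) and (0, 8, 6).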

(2) Data Retrieval Operation

Upon receiving the retrieval data, the control section 220 retrieves data similar to the retrieval data from the retrieval target data series, in line with the flow shown in the flowchart of FIG. 14. If there are a plurality of retrieval target data series, the same processing is performed for each retrieval target data series. The data retrieval operation according to the present embodiment is described below for an exemplary case where one retrieval target data series is focused on and data similar to the retrieval data is retrieved from that data series.

The control section 220 initially sets 1 to a variable j that manages the order, from the head of the retrieval target data series, of the data subject to processing (step S211), and calculates a similarity between the first data and the retrieval data by the similarity calculation section 110 (step S212).

If the similarity between the first data and the retrieval data is smaller than or equal to the threshold th (YES at step S213), the first data is output as similar data (step S215). Then, the control section 220 checks whether or not the second skip correspondence table of the first data is stored in the storing section 250 (step S216). If the table is not stored, the control section 220 changes the variable j to 2 by adding 1 (step S214), returns to step S212 via step S221, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the second skip correspondence table of the first data is stored, the control section 220 checks whether or not the second skip correspondence table includes a similarity range containing the similarity between the first data and the retrieval data (step S217). If the table does not include such a range, the control section 220 changes the variable j to 2 by adding 1 (step S214), returns to step S212 via step S221, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the second skip correspondence table of the first data includes a similarity range containing the similarity between the first data and the retrieval data, the control section 220 determines whether or not the skip destination data recorded for that similarity range is the data immediately following the data currently being processed (step S218). If the skip destination data is not the data immediately following the data currently being processed (that is, if some pieces of data are skipped), the control section 220 outputs, as similar data, the data ranging from the data immediately following the data currently being processed to the data immediately preceding the skip destination data (step S219). Then, the control section 220 sets the skip destination data as the data for which a similarity is calculated next (that is, changes the variable j so as to indicate the skip destination data) (step S220), returns to step S212 via step S221, and repeats the same processing as that applied to the first data for the skip destination data. Alternatively, if the skip destination data is the data immediately following the data currently being processed, the control section 220 does not perform step S219, sets the skip destination data as the data for which a similarity is calculated next (that is, changes the variable j so as to indicate the skip destination data) (step S220), returns to step S212 via step S221, and repeats the same processing as that applied to the first data for the skip destination data.

On the other hand, if the similarity between the first data and the retrieval data is larger than the threshold th (NO at step S213), the control section 220 changes the variable j to 2 by adding 1 (step S214), returns to step S212 via step S221, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

At step S221, the control section 220 determines whether the changed value of the variable j exceeds the maximum value j_(max), which is the number of data in the retrieval target data series. If the value does not exceed the maximum value, the control section 220 returns to step S212; if it does, the control section 220 ends the retrieval process for the retrieval target data series.
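The FIG. 14 flow can be sketched as the loop below. It is a minimal sketch, not the device's implementation: indices are 0-based rather than starting from 1, `retrieve_second` and `find_skip_offset` are hypothetical names, and the second skip correspondence tables are assumed to be given as a dictionary from data index to `(lower, upper, offset)` entries.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

SkipEntry = Tuple[float, float, int]  # hypothetical: (lower, upper, offset), lower < D <= upper


def find_skip_offset(table: List[SkipEntry], s: float) -> Optional[int]:
    """Return the offset recorded for the range containing s, if any."""
    for lower, upper, offset in table:
        if lower < s <= upper:
            return offset
    return None


def retrieve_second(
    series: Sequence[float],
    query: float,
    similarity: Callable[[float, float], float],
    th: float,
    second_tables: Dict[int, List[SkipEntry]],
) -> List[int]:
    """Sketch of the FIG. 14 loop; returns indices of data similar to the query."""
    result: List[int] = []
    j = 0                                            # step S211 (index 0 = first data)
    while j < len(series):                           # step S221
        s = similarity(query, series[j])             # step S212
        if s > th:                                   # NO at step S213
            j += 1                                   # step S214
            continue
        result.append(j)                             # step S215
        table = second_tables.get(j)                 # step S216
        offset = find_skip_offset(table, s) if table else None  # step S217
        if offset is None:
            j += 1                                   # step S214
            continue
        dest = j + offset
        # Steps S218-S219: skipped data are guaranteed similar, so output them
        # (the range is empty when dest is the immediately following data).
        result.extend(range(j + 1, min(dest, len(series))))
        j = dest                                     # step S220
    return result
```

As a hypothetical usage, take the series [0, 1, 2, 3, 10], the absolute difference as the similarity, th = 5, and for the first data the table {0: [(3, 4, 2), (2, 3, 3), (-5, 2, 4)]}, which is what the generation procedure above gives with m = 4. The query 0 returns indices 0 to 3 while calling the similarity function only twice.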

As described above, according to the present embodiment, data in which a similarity with the retrieval data is smaller than or equal to a predetermined threshold can be retrieved at a high speed from the retrieval target data series. This is because, if the similarity between data in the retrieval target data series and the retrieval data is smaller than or equal to the threshold, data for which similarity calculation is not necessary can be skipped by referring to the second skip correspondence table of such data.

For example, if the similarity between the data y_(j) in the retrieval target data series and the retrieval data is 28, the skip destination data is y_(j+7), according to the second skip correspondence table of the data y_(j) shown in FIG. 12 or 13. As such, since similarity calculation with the retrieval data is not performed on the 6 pieces of data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6) in the retrieval target data series, the retrieval time is reduced by the amount of those calculations. Further, since it is not necessary to determine, for each of the data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6), whether similarity calculation with the retrieval data is necessary, the retrieval time is further reduced by the amount of those determinations.

It should be noted that although the threshold th is fixed to one value in the present embodiment, the present invention is also applicable to a data retrieval device in which a plurality of thresholds th are used. In that case, a second skip correspondence table is generated and stored beforehand for each of the thresholds th. For example, if there are three thresholds th, such as 50, 60, and 70, a second skip correspondence table for th=50, a second skip correspondence table for th=60, and a second skip correspondence table for th=70 may be generated and stored.
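Generating one table per threshold only changes steps S204 to S206; the similarities d(y_(j), y_(j+k)) from step S202 do not depend on th and can be computed once and reused. The sketch below assumes this, and its names (`second_table_for`, `tables_per_threshold`) and `(lower, upper, offset)` layout are hypothetical.

```python
from typing import Dict, List, Tuple

SkipEntry = Tuple[float, float, int]  # hypothetical: (lower, upper, offset), lower < D <= upper


def second_table_for(d: List[float], th: float) -> List[SkipEntry]:
    """Steps S204-S206 for one threshold; d[k-1] = d(y_j, y_(j+k))."""
    cont: List[float] = []
    for dk in d:
        skip_upper = th - dk                                     # skip possible condition
        cont.append(min(cont[-1], skip_upper) if cont else skip_upper)  # continuous condition
    limits = sorted(set(cont), reverse=True)
    return [(lo, hi, cont.index(lo) + 1) for hi, lo in zip(limits, limits[1:])]


def tables_per_threshold(d: List[float], thresholds: List[float]) -> Dict[float, List[SkipEntry]]:
    """Generate one second skip correspondence table per threshold, keyed by th."""
    return {th: second_table_for(d, th) for th in thresholds}
```

At retrieval time, the table keyed by the threshold actually in use is consulted. With d = [10, 14, 12], raising th from 50 to 60 shifts the first entry from (36, 40] to (46, 50] while its destination stays y_(j+2).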

Third Embodiment

Referring to FIG. 15, a data retrieval device 300 according to a third embodiment of the present invention differs from the data retrieval device 100 according to the first embodiment in that a second skip correspondence table generation section 230 and a second skip correspondence table storing section 250 are added, and a control section 320 is included instead of the control section 120.

The second skip correspondence table generation section 230 is identical to the second skip correspondence table generation section 230 according to the second embodiment, and is a means for generating a second skip correspondence table of each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140. Further, the second skip correspondence table storing section 250 is identical to the second skip correspondence table storing section 250 according to the second embodiment, and is a means for storing the second skip correspondence table generated by the second skip correspondence table generation section 230.

The control section 320 is a means for controlling the entire data retrieval device 300. When retrieval data is input from outside the data retrieval device 300, the control section 320 controls the similarity calculation section 110 to calculate a similarity between the retrieval data and data in the retrieval target data series, and compares the calculation result with a predetermined threshold th to determine whether or not such data is similar to the retrieval data.

If the data is not similar to the retrieval data, the control section 320 determines, according to the similarity between the data and the retrieval data and the first skip correspondence table of the data, the data in the retrieval target data series for which a similarity with the retrieval data is calculated next, and repeats the same processing for the determined data.

In contrast, if the data is similar to the retrieval data, the control section 320 outputs the data as a retrieval result, and determines the data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the similarity between such data and the retrieval data and the second skip correspondence table of such data. If the determined data is not the data immediately following such data, the control section 320 outputs, as similar data, the data ranging from the data immediately following such data to the data immediately preceding the determined data, and repeats the same processing for the determined data.

Next, operation of the data retrieval device 300 according to the present embodiment will be described.

Operation of the data retrieval device 300 is roughly classified into a first and second skip correspondence table generating operation, which is performed prior to execution of the actual data retrieval operation, and a data retrieval operation using the generated first and second skip correspondence tables.

(1) First and Second Skip Correspondence Table Generating Operation

The operation of the first skip correspondence table generation section 130 for generating a first skip correspondence table of each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140 is the same as that of the first skip correspondence table generation section 130 according to the first embodiment. Since the detailed operation has been described above, its description is omitted.

The operation of the second skip correspondence table generation section 230 for generating a second skip correspondence table of each piece of data in the retrieval target data series stored in the retrieval target data series storing section 140 is the same as that of the second skip correspondence table generation section 230 according to the second embodiment. Since the detailed operation has been described above, its description is omitted.

(2) Data Retrieval Operation

Upon receiving the retrieval data, the control section 320 retrieves data similar to the retrieval data from the retrieval target data series, in line with the flow shown in the flowchart of FIG. 16. If there are a plurality of retrieval target data series, the same processing is performed for each retrieval target data series. The data retrieval operation according to the present embodiment is described below for an exemplary case where one retrieval target data series is focused on and data similar to the retrieval data is retrieved from that data series.

The control section 320 initially sets 1 to a variable j that manages the order, from the head of the retrieval target data series, of the data subject to processing (step S311), and calculates a similarity between the first data and the retrieval data by the similarity calculation section 110 (step S312).

If the similarity between the first data and the retrieval data is larger than the threshold th (NO at step S313), the control section 320 checks whether or not the first skip correspondence table of the first data is stored in the storing section 150 (step S314). If the table is not stored, the control section 320 changes the variable j to 2 by adding 1 (step S317), returns to step S312 via step S324, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the first skip correspondence table of the first data is stored, the control section 320 checks whether or not the first skip correspondence table includes a similarity range containing the similarity between the first data and the retrieval data (step S315). If the table does not include such a range, the control section 320 changes the variable j to 2 by adding 1 (step S317), returns to step S312 via step S324, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the first skip correspondence table of the first data includes a similarity range containing the similarity between the first data and the retrieval data, the control section 320 sets the skip destination data recorded for that similarity range as the data for which a similarity is calculated next (that is, changes the variable j so as to indicate the skip destination data) (step S316), returns to step S312 via step S324, and repeats the same processing as that applied to the first data for the skip destination data.

If the similarity between the first data and the retrieval data is smaller than or equal to the threshold th (YES at step S313), the first data is output as similar data (step S318). Then, the control section 320 checks whether or not the second skip correspondence table of the first data is stored in the storing section 250 (step S319). If the table is not stored, the control section 320 changes the variable j to 2 by adding 1 (step S317), returns to step S312 via step S324, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the second skip correspondence table of the first data is stored, the control section 320 checks whether or not the second skip correspondence table includes a similarity range containing the similarity between the first data and the retrieval data (step S320). If the table does not include such a range, the control section 320 changes the variable j to 2 by adding 1 (step S317), returns to step S312 via step S324, sets the second data as the data for which a similarity is calculated next, and repeats the same processing as that applied to the first data.

If the second skip correspondence table of the first data includes a similarity range containing the similarity between the first data and the retrieval data, the control section 320 determines whether or not the skip destination data recorded for that similarity range is the data immediately following the data currently being processed (step S321). If the skip destination data is not the data immediately following the data currently being processed (that is, if some pieces of data are skipped), the control section 320 outputs, as similar data, the data ranging from the data immediately following the data currently being processed to the data immediately preceding the skip destination data (step S322). Then, the control section 320 sets the skip destination data as the data for which a similarity is calculated next (that is, changes the variable j so as to indicate the skip destination data) (step S323), returns to step S312 via step S324, and repeats the same processing as that applied to the first data for the skip destination data. Alternatively, if the skip destination data is the data immediately following the data currently being processed, the control section 320 does not perform step S322, sets the skip destination data as the data for which a similarity is calculated next (that is, changes the variable j so as to indicate the skip destination data) (step S323), returns to step S312 via step S324, and repeats the same processing as that applied to the first data for the skip destination data.

At step S324, the control section 320 determines whether the changed value of the variable j exceeds the maximum value j_(max), which is the number of data in the retrieval target data series. If the value does not exceed the maximum value, the control section 320 returns to step S312; if it does, the control section 320 ends the retrieval process for the retrieval target data series.
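The FIG. 16 flow adds the first-table branch to the previous loop. This is a minimal sketch with 0-based indices. `retrieve_third` and `find_skip_offset` are hypothetical names. Both kinds of table are assumed to share the same `(lower, upper, offset)` layout, which is an assumption for illustration because the first skip correspondence table is only described in the first embodiment.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

SkipEntry = Tuple[float, float, int]  # hypothetical: (lower, upper, offset), lower < D <= upper


def find_skip_offset(table: Optional[List[SkipEntry]], s: float) -> Optional[int]:
    """Return the offset recorded for the range containing s, if any."""
    for lower, upper, offset in table or []:
        if lower < s <= upper:
            return offset
    return None


def retrieve_third(
    series: Sequence[float],
    query: float,
    similarity: Callable[[float, float], float],
    th: float,
    first_tables: Dict[int, List[SkipEntry]],
    second_tables: Dict[int, List[SkipEntry]],
) -> List[int]:
    """Sketch of the FIG. 16 loop; returns indices of data similar to the query."""
    result: List[int] = []
    j = 0                                                    # step S311
    while j < len(series):                                   # step S324
        s = similarity(query, series[j])                     # step S312
        if s > th:                                           # NO at step S313
            offset = find_skip_offset(first_tables.get(j), s)    # steps S314-S315
            j = j + offset if offset is not None else j + 1      # step S316 or S317
            continue
        result.append(j)                                     # step S318
        offset = find_skip_offset(second_tables.get(j), s)   # steps S319-S320
        if offset is None:
            j += 1                                           # step S317
            continue
        dest = j + offset
        result.extend(range(j + 1, min(dest, len(series))))  # steps S321-S322
        j = dest                                             # step S323
    return result
```

In a hypothetical run on [0, 1, 2, 3, 10, 11, 12, 1] with query 0, th = 5 and the absolute difference as similarity, a second table at index 0 lets the loop jump over 1, 2, 3, and a first table at index 4 such as [(6, 7, 2), (7, 14, 3)] lets it jump over 11 and 12. Only three similarity calculations are needed to return indices 0, 1, 2, 3 and 7.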

As described above, according to the present embodiment, data in which a similarity with the retrieval data is smaller than or equal to a predetermined threshold can be retrieved at a high speed from the retrieval target data series.

A first reason is that, when the similarity between data in the retrieval target data series and the retrieval data is larger than the threshold, data for which similarity calculation is not necessary can be skipped by referring to the first skip correspondence table of such data.

For example, if the similarity between the data y_(j) in the retrieval target data series and the retrieval data is 72, the skip destination data is y_(j+7), according to the first skip correspondence table of the data y_(j) shown in FIG. 5 or 6. As such, since similarity calculation with the retrieval data is not performed on the 6 pieces of data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6) in the retrieval target data series, the retrieval time is reduced by the amount of those calculations. Further, since it is not necessary to determine, for each of the data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6), whether similarity calculation with the retrieval data is necessary, the retrieval time is further reduced by the amount of those determinations.

A second reason is that, when the similarity between data in the retrieval target data series and the retrieval data is smaller than or equal to the threshold, data for which similarity calculation is not necessary can be skipped by referring to the second skip correspondence table of such data.

For example, if the similarity between the data y_(j) in the retrieval target data series and the retrieval data is 28, the skip destination data is the data y_(j+7), according to the second skip correspondence table of the data y_(j) shown in FIG. 12 or 13. Because similarity calculation with the retrieval data is therefore not performed on the 6 pieces of data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6) in the retrieval target data series, which are instead output directly as similar data (step S322), the retrieval time for those data is reduced. Further, because it is not necessary to determine, for each of the data y_(j+1), y_(j+2), y_(j+3), y_(j+4), y_(j+5), and y_(j+6), whether similarity calculation with the retrieval data needs to be performed, the retrieval time is reduced further by the cost of those determinations.
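The corresponding step with the second table, including the output of step S322, can be sketched as follows. As before, the table is hypothetical: only the row that maps a similarity of 28 to the offset 7 reflects the text, and the other rows, the (lower, upper] convention, the fallback to y_(j+1), and the sample index j = 100 are assumptions.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical second skip correspondence table for one datum y_(j):
# (upper end of a similarity range, skip offset from j). Only the row that
# contains 28 -> offset 7 mirrors the text; the other rows are invented.
SECOND_TABLE_YJ: List[Tuple[float, int]] = [(20, 12), (30, 7), (40, 3), (50, 1)]


def second_table_step(j: int, similarity: float,
                      table: List[Tuple[float, int]]) -> Tuple[int, List[int]]:
    """Return (index of the next datum to evaluate, indices output as similar data)."""
    bounds = [upper for upper, _ in table]
    i = bisect_left(bounds, similarity)
    offset = table[i][1] if i < len(table) else 1  # assumption: no match -> y_(j+1)
    dest = j + offset
    # Step S322: the data between y_(j) and the skip destination are reported
    # as similar without calculating their similarity.
    return dest, list(range(j + 1, dest))


nxt, similar = second_table_step(100, 28, SECOND_TABLE_YJ)
print(nxt, similar)  # 107 [101, 102, 103, 104, 105, 106]
```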

It should be noted that, although the threshold th is fixed to one value in the present embodiment, the present invention is also applicable to a data retrieval device that uses a plurality of thresholds th. In that case, first and second skip correspondence tables are generated and stored beforehand for each of the thresholds th. For example, if three thresholds th, such as 50, 60, and 70, are used, first and second skip correspondence tables for th=50, for th=60, and for th=70 may be generated and stored.
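Generating and storing tables for several thresholds beforehand can be sketched as a dictionary keyed by th. This sketch assumes, as an illustration only, that the similarity is a metric distance and that th + d(y_j, y_k) and th - d(y_j, y_k) serve as the skip possible conditions; `build_tables`, `precompute`, and the three-element series are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Callable[[float, float], float]
Tables = Tuple[List[List[float]], List[List[float]]]


def build_tables(series: Sequence[float], th: float, dist: Dist) -> Tables:
    """Hypothetical builder: per datum y_j, the running max of th + d(y_j, y_k)
    (first table) and the running min of th - d(y_j, y_k) (second table)."""
    first: List[List[float]] = []
    second: List[List[float]] = []
    for j, y in enumerate(series):
        lo: List[float] = []
        hi: List[float] = []
        run_max, run_min = float("-inf"), float("inf")
        for z in series[j + 1:]:
            d = dist(y, z)
            run_max = max(run_max, th + d)
            run_min = min(run_min, th - d)
            lo.append(run_max)
            hi.append(run_min)
        first.append(lo)
        second.append(hi)
    return first, second


def precompute(series: Sequence[float], thresholds: Sequence[float],
               dist: Dist) -> Dict[float, Tables]:
    """Generate and store first and second tables beforehand for every threshold."""
    return {th: build_tables(series, th, dist) for th in thresholds}


store = precompute([10, 12, 30], [50, 60, 70], lambda a, b: abs(a - b))
first_60, second_60 = store[60]   # a retrieval with th = 60 uses only these tables
print(first_60[0], second_60[0])  # [62, 80] [58, 40]
```

At retrieval time, the table pair for the threshold in use is looked up, for example `store[60]`, and retrieval then proceeds exactly as in the single-threshold case.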

While the embodiments of the present invention have been described above, the present invention is not limited to these examples. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present invention. Further, the functions of the data retrieval device of the present invention can be realized by a computer and a program as well as by hardware. Such a program is provided written on a computer-readable recording medium such as a magnetic disk or a semiconductor memory, is read by the computer, for example when the computer is started, and controls the operation of the computer, thereby allowing the computer to function as the similarity calculation section, the control section, the first skip correspondence table generation section, the second skip correspondence table generation section, and the like of the above-described embodiments.

This application is based upon and claims the benefit of priority from Japanese patent application No. 2009-12811, filed on Jan. 23, 2009, the disclosure of which is incorporated herein in its entirety by reference.

REFERENCE NUMERALS

-   100, 200, 300 data retrieval device
-   110 similarity calculation section
-   120, 220, 320 control section
-   130 first skip correspondence table generation section
-   140 retrieval target data series storing section
-   150 first skip correspondence table storing section
-   230 second skip correspondence table generation section

1. A data retrieval device, comprising: a first skip correspondence table that corresponds to each piece of data in a retrieval target data series, and, for each possible similarity range which is taken by a similarity between corresponding data and retrieval data, records skip destination data information for specifying the data which appears first after the corresponding data among pieces of data in which similarities with the retrieval data have the possibility to have a predetermined relationship in comparison with a predetermined threshold; and a control unit that, when retrieving data in which a similarity with the retrieval data is smaller than or equal to the threshold from among the retrieval target data series, selects data in the retrieval target data series for which calculation of a similarity with the retrieval data is necessary, using the first skip correspondence table.
2. The data retrieval device, according to claim 1, wherein the predetermined relationship is a relationship in which a similarity with the retrieval data is smaller than or equal to the threshold.
3. The data retrieval device, according to claim 2, wherein if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is not smaller than or equal to the threshold, the control unit determines data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data.
4. The data retrieval device, according to claim 3, wherein if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, the control unit determines data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
5. The data retrieval device, according to claim 2, further comprising a first skip correspondence table generation unit that receives the retrieval target data series and generates the first skip correspondence table of each piece of data in the retrieval target data series.
6. The data retrieval device, according to claim 5, wherein the first skip correspondence table generation unit calculates a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtains, from the similarity and the threshold, a skip possible condition indicating a lower limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes smaller than or equal to the threshold, calculates a continuous skip possible condition indicating a maximum value of the lower limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generates a first skip correspondence table of the generation target data.
7. The data retrieval device, according to claim 1, wherein the predetermined relationship is a relationship in which a similarity with the retrieval data is larger than the threshold.
8. The data retrieval device, according to claim 7, wherein if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is smaller than or equal to the threshold, the control unit determines data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data.
9. The data retrieval device, according to claim 8, wherein if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, the control unit determines data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
10. The data retrieval device, according to claim 7, further comprising a first skip correspondence table generation unit that receives the retrieval target data series and generates the first skip correspondence table of each piece of data in the retrieval target data series.
11. The data retrieval device, according to claim 10, wherein the first skip correspondence table generation unit calculates a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtains, from the similarity and the threshold, a skip possible condition indicating an upper limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes larger than the threshold, calculates a continuous skip possible condition indicating a minimum value of the upper limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generates a first skip correspondence table of the generation target data.
12. The data retrieval device, according to claim 6, wherein the first skip correspondence table generation unit combines a plurality of continuous similarity ranges in the generated first skip correspondence table into one similarity range, and assigns the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range.
13. The data retrieval device, according to claim 4, wherein the first skip correspondence table generation unit generates the first skip correspondence table only for partial data of the retrieval target data series.
14. The data retrieval device, according to claim 2, further comprising a second skip correspondence table that corresponds to each piece of data in the retrieval target data series, and, for each possible similarity range which is taken by a similarity between corresponding data and retrieval data, records skip destination data information for specifying the data which appears first after the corresponding data among pieces of data in which similarities with the retrieval data have the possibility not to have the predetermined relationship with the threshold, wherein the control unit selects data in the retrieval target data series for which calculation of a similarity with the retrieval data is necessary, using the first skip correspondence table and the second skip correspondence table.
15. The data retrieval device, according to claim 14, wherein if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is not smaller than or equal to the threshold, the control unit determines data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data, and if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is smaller than or equal to the threshold, the control unit determines data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the second skip correspondence table of the piece of data.
16. The data retrieval device, according to claim 15, wherein if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, the control unit determines data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next, and if the similarity range including the similarity between the piece of data and the retrieval data is present in the second skip correspondence table, the control unit determines data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the second skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
17. The data retrieval device, according to claim 14, further comprising: a first skip correspondence table generation unit that receives the retrieval target data series and generates the first skip correspondence table of each piece of data in the retrieval target data series; and a second skip correspondence table generation unit that receives the retrieval target data series and generates the second skip correspondence table of each piece of data in the retrieval target data series.
18. The data retrieval device, according to claim 17, wherein the first skip correspondence table generation unit calculates a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtains, from the similarity and the threshold, a skip possible condition indicating a lower limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes smaller than or equal to the threshold, calculates a continuous skip possible condition indicating a maximum value of the lower limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generates a first skip correspondence table of the generation target data, and the second skip correspondence table generation unit calculates a similarity between subsequent data of generation target data of the second skip correspondence table and the generation target data, obtains, from the similarity and the threshold, a skip possible condition indicating an upper limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes larger than the threshold, calculates a continuous skip possible condition indicating a minimum value of the upper limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generates a second skip correspondence table of the generation target data.
19. The data retrieval device, according to claim 18, wherein the first skip correspondence table generation unit combines a plurality of continuous similarity ranges in the generated first skip correspondence table into one similarity range, and assigns the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range, and the second skip correspondence table generation unit combines a plurality of continuous similarity ranges in the generated second skip correspondence table into one similarity range, and assigns the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range.
20. The data retrieval device, according to claim 17, wherein the first skip correspondence table generation unit generates the first skip correspondence table only for partial data of the retrieval target data series, and the second skip correspondence table generation unit generates the second skip correspondence table only for partial data of the retrieval target data series.
21. The data retrieval device, according to claim 1, wherein the data is a feature vector, and the similarity is a distance between feature vectors.
22. A data retrieving method, comprising, using a first skip correspondence table that corresponds to each piece of data in a retrieval target data series, and, for each possible similarity range which is taken by a similarity between corresponding data and retrieval data, records skip destination data information for specifying the data which appears first after the corresponding data among pieces of data in which similarities with the retrieval data have the possibility to have a predetermined relationship in comparison with a predetermined threshold, when retrieving data in which a similarity with the retrieval data is smaller than or equal to the threshold from among the retrieval target data series, selecting data in the retrieval target data series for which calculation of a similarity with the retrieval data is necessary.
23. The data retrieving method, according to claim 22, wherein the predetermined relationship is a relationship in which a similarity with the retrieval data is smaller than or equal to the threshold.
24. The data retrieving method, according to claim 23, wherein the selecting includes, if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is not smaller than or equal to the threshold, determining data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data.
25. The data retrieving method, according to claim 24, wherein the selecting includes, if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, determining data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
26. The data retrieving method, according to claim 23, further comprising receiving the retrieval target data series and generating the first skip correspondence table of each piece of data in the retrieval target data series.
27. The data retrieving method, according to claim 26, wherein the generating the first skip correspondence table includes calculating a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtaining, from the similarity and the threshold, a skip possible condition indicating a lower limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes smaller than or equal to the threshold, calculating a continuous skip possible condition indicating a maximum value of the lower limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generating a first skip correspondence table of the generation target data.
28. The data retrieving method, according to claim 22, wherein the predetermined relationship is a relationship in which a similarity with the retrieval data is larger than the threshold.
29. The data retrieving method, according to claim 28, wherein the selecting includes, if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is smaller than or equal to the threshold, determining data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data.
30. The data retrieving method, according to claim 29, wherein the selecting includes, if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, determining data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
31. The data retrieving method, according to claim 28, further comprising receiving the retrieval target data series and generating the first skip correspondence table of each piece of data in the retrieval target data series.
32. The data retrieving method, according to claim 31, wherein the generating the first skip correspondence table includes calculating a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtaining, from the similarity and the threshold, a skip possible condition indicating an upper limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes larger than the threshold, calculating a continuous skip possible condition indicating a minimum value of the upper limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generating a first skip correspondence table of the generation target data.
33. The data retrieving method, according to claim 27, wherein the generating the first skip correspondence table includes combining a plurality of continuous similarity ranges in the generated first skip correspondence table into one similarity range, and assigning the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range.
34. The data retrieving method, according to claim 25, wherein in the generating the first skip correspondence table, the first skip correspondence table is generated only for partial data of the retrieval target data series.
35. The data retrieving method, according to claim 23, wherein the selecting includes selecting data in the retrieval target data series for which calculation of a similarity with the retrieval data is necessary by using, in addition to the first skip correspondence table, a second skip correspondence table that corresponds to each piece of data in the retrieval target data series, and, for each possible similarity range which is taken by a similarity between corresponding data and retrieval data, records skip destination data information for specifying the data which appears first after the corresponding data among pieces of data in which similarities with the retrieval data have the possibility not to have the predetermined relationship with the threshold.
36. The data retrieving method, according to claim 35, wherein the selecting includes, if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is not smaller than or equal to the threshold, determining data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the first skip correspondence table of the piece of data, and if a similarity with the retrieval data calculated for a piece of data in the retrieval target data series is smaller than or equal to the threshold, determining data in the retrieval target data series for which a similarity with the retrieval data is calculated next, according to the calculated similarity and the second skip correspondence table of the piece of data.
37. The data retrieving method, according to claim 36, wherein the selecting includes, if a similarity range including the similarity between the piece of data and the retrieval data is present in the first skip correspondence table, determining data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the first skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next, and if the similarity range including the similarity between the piece of data and the retrieval data is present in the second skip correspondence table, determining data indicated by skip destination data information, which is recorded corresponding to the similarity range present in the second skip correspondence table, to be data in the retrieval target data series for which a similarity with the retrieval data is calculated next.
38. The data retrieving method, according to claim 35, further comprising: receiving the retrieval target data series and generating the first skip correspondence table of each piece of data in the retrieval target data series; and receiving the retrieval target data series and generating the second skip correspondence table of each piece of data in the retrieval target data series.
39. The data retrieving method, according to claim 38, wherein the generating the first skip correspondence table includes calculating a similarity between subsequent data of generation target data of the first skip correspondence table and the generation target data, obtaining, from the similarity and the threshold, a skip possible condition indicating a lower limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes smaller than or equal to the threshold, calculating a continuous skip possible condition indicating a maximum value of the lower limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generating a first skip correspondence table of the generation target data, and the generating the second skip correspondence table includes calculating a similarity between subsequent data of generation target data of the second skip correspondence table and the generation target data, obtaining, from the similarity and the threshold, a skip possible condition indicating an upper limit of a similarity between the generation target data and the retrieval data having no possibility that a similarity between the subsequent data and the retrieval data becomes larger than the threshold, calculating a continuous skip possible condition indicating a minimum value of the upper limit of the similarity provided by the skip possible conditions of the self data and subsequent data preceding the self data, and according to the continuous skip possible condition calculated, generating a second skip correspondence table of the generation target data.
40. The data retrieving method, according to claim 39, wherein the generating the first skip correspondence table includes combining a plurality of continuous similarity ranges in the generated first skip correspondence table into one similarity range, and assigning the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range, and the generating the second skip correspondence table includes combining a plurality of continuous similarity ranges in the generated second skip correspondence table into one similarity range, and assigning the most preceding data of a plurality of skip destination data corresponding to the similarity ranges before combination, as skip destination data corresponding to the combined similarity range.
41. The data retrieving method, according to claim 38, wherein in the generating the first skip correspondence table, the first skip correspondence table is generated only for partial data of the retrieval target data series, and in the generating the second skip correspondence table, the second skip correspondence table is generated only for partial data of the retrieval target data series.
42. The data retrieving method, according to claim 22, wherein the data is a feature vector, and the similarity is a distance between feature vectors.
43. A program for causing a computer to perform, using a first skip correspondence table that corresponds to each piece of data in a retrieval target data series, and, for each possible similarity range which is taken by a similarity between corresponding data and retrieval data, records skip destination data information for specifying the data which appears first after the corresponding data among pieces of data in which similarities with the retrieval data have the possibility to have a predetermined relationship in comparison with a predetermined threshold, when retrieving data in which a similarity with the retrieval data is smaller than or equal to the threshold from among the retrieval target data series, a process of selecting data in the retrieval target data series for which calculation of a similarity with the retrieval data is necessary, using the first skip correspondence table.
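Claims 12, 19, 33, and 40 recite combining continuous similarity ranges of a generated table into one range and assigning it the most preceding of their skip destinations. A minimal Python sketch of that step follows. The row layout (lower end exclusive, upper end inclusive, skip destination index), the caller-chosen `start`/`end` slice, and the sample rows are hypothetical, since the claims do not specify which ranges are combined.

```python
from typing import List, Tuple

# Hypothetical row layout: (lower end, exclusive; upper end, inclusive; skip destination index)
Entry = Tuple[float, float, int]


def combine_ranges(table: List[Entry], start: int, end: int) -> List[Entry]:
    """Combine the contiguous entries table[start:end] into one similarity range.

    The combined range is assigned the most preceding (smallest) skip destination
    among the combined entries, so it never skips past a datum that one of the
    original ranges would have required to be evaluated.
    """
    group = table[start:end]
    if len(group) < 2:
        return list(table)
    for left, right in zip(group, group[1:]):
        if left[1] != right[0]:
            raise ValueError("only contiguous similarity ranges can be combined")
    merged: Entry = (group[0][0], group[-1][1], min(dest for _, _, dest in group))
    return table[:start] + [merged] + table[end:]


# Hypothetical first-table rows for y_(j) with j = 10 and th = 50.
rows: List[Entry] = [(50, 55, 11), (55, 63, 13), (63, 75, 17), (75, 90, 22)]
print(combine_ranges(rows, 1, 3))
# [(50, 55, 11), (55, 75, 13), (75, 90, 22)]
```

Choosing the smallest destination keeps the combined entry safe for both the first and the second table, at the cost of skipping fewer data for similarities that fell in the later ranges; in exchange, the table is shorter to store and to search.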